How the DNA sequence affects the Hill curve of transcriptional response 
The Hill coefficient is often used as a direct measure of the cooperativity of binding processes. It 
is an essential tool for probing properties of reactions in many biochemical systems. Here we analyze 
existing experimental data and demonstrate that the Hill coefficient characterizing the binding of 
transcription factors to their cognate sites can in fact be larger than one - the standard indication 
of cooperativity - even in the absence of any standard cooperative binding mechanism. By studying 
the problem analytically, we demonstrate that this effect occurs due to the disordered binding energy 
of the transcription factor to the DNA molecule and the steric interactions between the different 
copies of the transcription factor. We show that the enhanced Hill coefficient implies a significant 
reduction in the number of copies of the transcription factors which is needed to occupy a cognate 
site and, in many cases, can explain existing estimates for numbers of the transcription factors in 
cells. The mechanism is general and should be applicable to other biological recognition processes. 

PACS numbers: 



Molecular recognition plays an important role in biological systems ranging from antigen-antibody identification to protein-protein binding. In many cases the recognition process is driven by free-energy differences between a desired reaction and many competing undesired reactions. One example, of particular importance, is that of protein-DNA interactions. Its role in understanding regulation in cells has led to a large experimental effort which aims at mapping the binding energy between transcription factors (TFs) in their specific state and different subsequences on the DNA. It is known that to a good approximation the energy can be written as a sum of energies representing the binding energy of a nucleotide on the DNA to the region on the protein with which it is aligned. Specifically, the binding energy of a nucleotide $s = A, C, G, T$ at position $j = 1, 2, \ldots, L$ (where $L$ is the length of the protein's DNA binding domain in units of basepairs) is usually described by a $4 \times L$ position weight matrix (PWM), $\epsilon_{s,j}$. By now, the PWM is known for many proteins and, together with knowledge of the genomic sequence, it specifies the binding energy landscape of TFs on the DNA.
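
As a minimal sketch of the additive energy model described above, the snippet below sums PWM entries over an aligned window and scans a sequence to obtain an energy landscape. The three-position matrix `toy_pwm`, the sequence `toy_genome` and the function names are hypothetical and chosen only for illustration; measured PWMs are longer and their entries come from experiment.

```python
BASES = "ACGT"

def binding_energy(pwm, site):
    """Sum the per-position energies pwm[j][s] over the L aligned bases."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal the PWM length L")
    return sum(column[base] for column, base in zip(pwm, site))

def energy_landscape(pwm, genome):
    """Binding energy of every length-L window along one strand."""
    L = len(pwm)
    return [binding_energy(pwm, genome[i:i + L])
            for i in range(len(genome) - L + 1)]

# Hypothetical 3-position PWM in units of k_B T (illustrative values only).
toy_pwm = [
    {"A": -2.0, "C": 0.5, "G": 1.0, "T": 0.5},
    {"A": 0.5, "C": -1.5, "G": 0.5, "T": 0.5},
    {"A": 1.0, "C": 0.5, "G": 0.5, "T": -2.0},
]
toy_genome = "GGACTTACT"
landscape = energy_landscape(toy_pwm, toy_genome)
```

In this toy example the window `ACT` matches the lowest entry of each column and therefore plays the role of a strong binding site.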

Irrespective of the energy landscape properties, the activation of a cognate site (a specific location on the DNA) by a TF is usually described by a Hill curve. Namely, if we consider a DNA molecule inside a container representing, say, a prokaryotic cell, the activation probability of an operator by a TF is given by

$$P_t(m) = \frac{m^n}{m^n + m_{1/2}^n}. \tag{1}$$

Here $m$ is the number of TFs in the cell, and at $m = m_{1/2}$ the occupation probability is one half (the conversion to concentrations is trivial). The Hill coefficient (HC), $n$, governs the steepness of the curve and is widely used to extract qualitative information about the regulation of genes from experimental data. In the simplest cases, when there is no cooperative binding involved, one expects $n = 1$. In the presence of cooperative interactions $n$ differs from one. For example, in the case of activation by dimers one expects $n = 2$ if $m$ is the number of monomers.
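
To make the role of the two Hill parameters concrete, here is a small sketch that assumes the standard Hill form $P_t(m) = m^n / (m^n + m_{1/2}^n)$. The function name `hill` and the numbers used in the usage lines are illustrative assumptions.

```python
def hill(m, m_half, n):
    """Occupation probability for m TF copies with half-occupation at m_half."""
    return m**n / (m**n + m_half**n)

# At m = m_half the occupation is one half for any n.
p_half = hill(3000.0, 3000.0, 1.7)

# Doubling m beyond m_half raises the occupation more sharply for larger n.
p_n1 = hill(2.0, 1.0, 1.0)  # non-cooperative case
p_n2 = hill(2.0, 1.0, 2.0)  # dimer-like case
```

The comparison of `p_n1` and `p_n2` shows how $n$ controls the steepness around $m_{1/2}$ while leaving the half-occupation point fixed.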

In this article we demonstrate that this simple intuitive 
picture for the Hill curve can fail. This is a direct conse- 
quence of a non-trivial combination of variations in the 
binding energy of TFs to different sites along the DNA 
and the steric repulsion between them. This leads to (a) 
a disorder enhanced Hill coefficient which is larger than 
one even in the absence of any cooperative binding to the 
operator, and (b) a dramatic increase in the occupation 
probability of the cognate site as compared to a system 
with no steric interactions between the TFs or a constant 
non-cognate binding energy. Importantly, we show that 
the results are essential for explaining the number of TFs 
found in cells. 



I. RESULTS 

The Hill curve, Eq. (1), is directly related to a formulation of the problem using statistical mechanics and the knowledge of the experimentally measured binding energy landscape of the TF on the DNA. To illustrate this we first focus on a simple case where: (i) There is no cooperativity associated with the structure of the TF or its binding properties, such that one would naively expect $n = 1$. (ii) The probability of the TF to be off the DNA or in a non-specific conformation on the DNA is negligible. Note that by a non-specific conformation we mean one where the TF is on the DNA but does not interact with the bases. This conformation, which typically occurs due to electrostatic interactions, exists at any location along the DNA, including the cognate site. The effects of both simplifications are discussed in the Supplementary Information (SI). Using standard statistical mechanics it is straightforward to calculate $m_{1/2}$ and $P_t(m)$ numerically (see Methods). We use the PRODORIC database for PWMs of E. coli TFs, their cognate sequences and the genomic sequence of E. coli of length $N = 2 \times 4686077$. In the next section we discuss the results of this approach and show that it can lead to rather counterintuitive Hill curves. We then give a simple theory which accounts for the results analytically and gives precise conditions for these effects to be important.

FIG. 1: The value of $m_{1/2}$ is presented as a function of the energy of different cognate sequences, $E_t$, for (a) LexA, (b) RpoN and (c) Lrp. The dotted lines are based on a numerical calculation using the E. coli genome. The circles represent energies calculated from all known cognate sites of the TF. The black solid lines are based on the freezing regime approximation, Eq. (13), while the red dashed lines are based on the non-steric approximation, Eq. (8). Filled, grey horizontal areas show the typical range of the TF's copy number in E. coli. The symbol $m_c$ in (c) marks the crossover value of the Lrp TF between the uncrowded and crowded regimes, predicted by Eqs. (27) and (30). (d) The HC, obtained by a fit of the numerical data to Eq. (1), is shown as a function of the cognate site energy for the LexA TF. The solid line is the analytical prediction, Eq. (14), while the circles represent numerical data based on real DNA and cognate site sequences. The filled circle represents the HC of a hypothetical cognate site with a perfect consensus sequence. The dashed line represents the result of the non-steric approximation, which gives $n = 1$.



A. E.coli TFs Hill curves 

First, we evaluate $m_{1/2}$ numerically for all known cognate sites of three TFs. The results are shown in Fig. 1 (for the moment focus on the data shown as circles in panels a, b and c). As can be seen (and is intuitively clear), the weaker the binding energy of the cognate site (larger $E_t$) the larger $m_{1/2}$. The values of $m_{1/2}$ span a large range for different TFs and for different cognate sites of the same TF. Next, in Figs. 2 and 3 (b and c panels) we present the occupation probability of typical cognate sites for two representative TFs, one cognate site of LexA and two of RpoN. In addition we fit the results to a Hill curve. With the value of $m_{1/2}$ given, the only fit parameter is the HC, $n$. Surprisingly, for the three cases (and others not shown) we obtain $n > 1$ (in Figs. 2 and 3 we obtain $n = 1.7, 8.3, 2.03$), despite the absence of cooperativity in the model between the TF copies apart from steric interactions. As shown below this occurs for many TFs.
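
One simple way to carry out a one-parameter fit of this kind is sketched below. It assumes the Hill form $P = m^n/(m^n + m_{1/2}^n)$, for which $\ln[P/(1-P)] = n \ln(m/m_{1/2})$, so $n$ is the slope of a line through the origin. This linearized least-squares procedure and the name `fit_hill_coefficient` are assumptions made for illustration; the fitting method actually used for the figures is not specified in the text.

```python
import math

def fit_hill_coefficient(ms, ps, m_half):
    """Least-squares slope through the origin of logit(P) versus ln(m / m_half)."""
    xs, ys = [], []
    for m, p in zip(ms, ps):
        if m > 0 and 0.0 < p < 1.0:  # logit is undefined at P = 0 or 1
            xs.append(math.log(m / m_half))
            ys.append(math.log(p / (1.0 - p)))
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least one point with m != m_half")
    return sum(x * y for x, y in zip(xs, ys)) / sxx

# Synthetic check: data generated with n = 1.7 and m_half = 3000.
true_n, m_half = 1.7, 3000.0
ms = [300.0, 1000.0, 3000.0, 10000.0, 30000.0]
ps = [m**true_n / (m**true_n + m_half**true_n) for m in ms]
fitted_n = fit_hill_coefficient(ms, ps, m_half)
```

On exact Hill data the slope reproduces the input coefficient; on numerical occupation curves the result depends on the range of $m$ included in the fit.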

Remarkably, performing the same procedure but ignoring the steric repulsion of the TFs everywhere apart from the cognate site one obtains qualitatively and quantitatively distinct results (see Figs. 1, 2 and 3). This is despite the fact that in most cases the ratio of the number of TFs in the cell to the length of the DNA is very small. In fact, ignoring steric repulsion leads to a significant increase in the value of $m_{1/2}$ (in some cases much above the measured estimate of the number of TFs in the cell). Moreover, this simplification always yields a HC of $n = 1$. Similarly, it is clear that without disorder in the binding energy of the TF to different sites along the DNA one would obtain $n = 1$.

In sum, to account properly for the Hill curve of TFs binding to DNA one must (a) account for the disordered binding energy of the TF to different sites along the DNA and (b) account for steric interactions between different copies of the TFs. Without these the HC would be lower than the actual one (later we show that this is true even for cases where naively one expects a HC which is larger than one) and $m_{1/2}$ would be much larger than the real one. These are the main results of this paper. In what follows we explain the obtained results using an analytical approach. We show that the effect is generic and illustrate its importance for many TFs.



B. Analytical solution 

As stated above, the binding energy to any sequence on the DNA is a sum of $L = 10$ to $30$ independent variables. Therefore, assuming a pseudorandom sequence on the non-cognate DNA, the probability distribution of the binding energy, $\Pr(E_i)$, is close to normal (see Figs. 2(a) and 3(a)):

$$\Pr(E_i) \propto \exp\left(-\frac{E_i^2}{2\sigma^2}\right). \tag{2}$$

Here we define the centre of the normal distribution as zero energy. The value of $\sigma$ characterizes the width of the disordered binding energy profile along the DNA and is usually in the range 2 to 8, where throughout the paper we measure energy in units of $k_B T$ with $k_B$ the Boltzmann constant and $T$ the temperature. The validity of this approximation is illustrated in Figs. 2(a) and 3(a), where we plot the histograms of the binding energies of two typical TFs on the E. coli DNA.

FIG. 2: (a) A histogram of the binding energies of the LexA TF to the E. coli genome. The red points represent numerical data while the line is based on a Gaussian fit with $\sigma = 5.76$. The big filled circle marks the binding energy to the recQ operon sequence with $E_t = -20.6$. (b) The occupation probability of the LexA protein to the recQ operon as a function of $m$. The circles represent numerical data based on the PWM of the protein and the E. coli DNA sequence. The dashed line is based on the non-steric approximation, Eq. (3), while the solid line is based on Eq. (12). The solid arrow shows the crossover value (nonphysical in this case) of the protein copy number, $m_c = 0.65$, predicted by Eqs. (27) and (30). (c) The same data as shown in panel (b) with a Hill function, Eq. (1), fit (thin line) to the numerical data (circles) with $n \approx 1.7$ and $m_{1/2} \approx 3000$. The gray areas in (b) and (c) show the typical range of the LexA copy number in E. coli (200 to 4000).

FIG. 3: (a) A histogram of the binding energy of the RpoN TF to the E. coli genome. The red points represent the numerical data while the line is based on a Gaussian fit with $\sigma = 7.8$. The circle shows the binding energy to the argT operon with $E_t = -41.8$. The square shows the binding energy to the fixABCX operon with $E_t \approx -28.9$. (b) The occupation probability of the RpoN protein for the operons argT (circles and 1 symbol) and fixABCX (squares and 2 symbol) as a function of $m$. The circles and squares represent numerical data based on the PWMs of the TFs and the E. coli DNA sequence. The dashed (dotted) line is based on the non-steric approximation, Eq. (3), for the argT (fixABCX) operon, while the solid lines are based on Eq. (12). (c) The same numerical data as in panel (b) with a fit of a Hill function, Eq. (1) (thin, solid, black lines), to the numerical data with $n \approx 8.3$ and $m_{1/2} \approx 3.3$ for the argT operon and with $n \approx 2.07$ and $m_{1/2} = 1300$ for the fixABCX operon. Vertical areas in panels (b) and (c) show the typical number of the RpoN proteins in E. coli ($\approx 110$) [15].
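
Because the energy is a sum of independent per-position terms, its mean and variance on a pseudorandom sequence are the sums of the per-column means and variances. The sketch below estimates the Gaussian width $\sigma$ this way. It assumes equal base frequencies, and the matrix `toy_pwm` and the name `gaussian_parameters` are hypothetical; the widths quoted in the figure captions come from fits to genome histograms, not from this formula.

```python
import math

def gaussian_parameters(pwm):
    """Mean and standard deviation of the site energy for uniform random bases."""
    mean, variance = 0.0, 0.0
    for column in pwm:
        values = list(column.values())
        column_mean = sum(values) / len(values)
        mean += column_mean
        variance += sum((v - column_mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

# Hypothetical 3-position PWM in units of k_B T (illustrative values only).
toy_pwm = [
    {"A": -2.0, "C": 0.5, "G": 1.0, "T": 0.5},
    {"A": 0.5, "C": -1.5, "G": 0.5, "T": 0.5},
    {"A": 1.0, "C": 0.5, "G": 0.5, "T": -2.0},
]
mean_energy, sigma = gaussian_parameters(toy_pwm)
```

Subtracting `mean_energy` from every site energy puts the centre of the distribution at zero, which is the convention used in the text.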



Before presenting a full analysis of the problem it is interesting to consider a case where the different copies of the TFs have steric interactions only on the cognate site. We refer to this as the non-steric approximation. As shown in the SI, here the occupation probability of the cognate site is given by

$$P_t^{\text{non-steric}} = \frac{1}{1 + \frac{N}{m}\, e^{E_t + \sigma^2/2}}, \tag{3}$$

where $E_t$ is the energy of the operator of interest. Thus, by comparing with Eq. (1) one has

$$m_{1/2} = N e^{E_t + \sigma^2/2}, \tag{4}$$
$$n = 1. \tag{5}$$

In particular, this implies that without steric interactions there is no enhanced HC and one obtains the naive results.

From the numerical experiment presented before, clearly, the non-steric approximation can fail. In addition to giving a wrong value for $n$ it gives, in some cases, unreasonable values of $m_{1/2}$. Indeed, in Figs. 2 and 3 we show numerical results compared with Eq. (3). For LexA the values of $m_{1/2}$ differ by more than two orders of magnitude and, as stated above, the value of $n$ is larger than one. Moreover, the number of LexA TFs in the cell is estimated to be between 200 and 4000, a value much lower than the $m_{1/2}$ found in the non-steric approximation. A similar disagreement between the numerical results and the non-steric approximation occurs for RpoN as well as for many other TFs.

To analyze the problem more carefully, taking into account steric interactions, we calculate the occupation probability of the cognate site in two limiting cases (see Methods and SI):

a. Uncrowded regime: In this regime, illustrated in Fig. 4(a), the number of TFs is smaller than a crossover value defined by the DNA length and the disorder width:

$$m < m_c = N e^{-\sigma^2/2}. \tag{6}$$

The occupation probability in this regime is given by the non-steric approximation, Eq. (3), so that:

$$P_t = \frac{1}{1 + \frac{N}{m}\, e^{E_t + \sigma^2/2}}. \tag{7}$$

We note that the disorder width, $\sigma$, has different values for different TFs with typical values in the range of 2 to 8. Thus, the crossover value of the protein's copy number varies from a non-physical $10^{-7}$ (where the non-steric approximation surely fails) to $10^{6}$, where the non-steric approximation is expected to hold unless the number of TFs is extremely high.

The above derivation implies that the non-steric approximation (valid in the uncrowded regime) will give a good estimation of the HC and the number of particles at half occupation,

$$m_{1/2} = N e^{E_t + \sigma^2/2}, \tag{8}$$
$$n = 1, \tag{9}$$

only when $m_{1/2} \ll m_c$. Furthermore, to be in the self-averaging regime near $m \approx m_{1/2}$ the cognate site of interest, with energy $E_t$, has to satisfy the condition

$$E_t < -\sigma^2. \tag{10}$$

b. Crowded regime: In this regime, illustrated in Fig. 4(b),

$$m > m_c = N e^{-\sigma^2/2}, \tag{11}$$

and the occupation probability is given by

(12)

If close to saturation of the cognate site the system is in the crowded regime, $m_{1/2} \gg m_c$, then comparing Eqs. (1) and (12) one can see that

$$m_{1/2} = \frac{N}{\sqrt{1 + 2\sigma^2}} \exp\left[-\frac{W^2\left(e^{-E_t}\sigma^2\right)}{2\sigma^2}\right], \tag{13}$$
$$n = \sigma^2\, \frac{1 + W\left(e^{-E_t}\sigma^2\right)}{W^2\left(e^{-E_t}\sigma^2\right)}, \tag{14}$$

where $W$ is the Lambert W-function. The function is well approximated by $W(X) \approx \ln(X)$ for large $X$. Note that to be in the crowded regime close to saturation of the cognate site the condition

$$E_t > -\sigma^2 \tag{15}$$

has to be satisfied.

FIG. 4: An illustration of the crowding effect in an energy landscape with high disorder. Typical disordered energy profiles, with small and high disorder strength, are plotted in the (a) and (b) panels respectively. The circles represent TFs whose number is much smaller than the number of possible binding sites on the DNA. Therefore, in the small disorder case, (a), they spread evenly across the DNA. When the disorder strength is high (panel (b)) the TFs compete for a limited number of traps. Then steric interactions between those competing for this small number of traps become important. As the traps fill with TFs the occupation probability of the cognate site increases dramatically. This leads to a reduction in $m_{1/2}$ and an enhanced HC.

To check our analytical results we plot both the non-steric approximation and the results from the crowded regime in Figs. 2 and 3. The non-steric approximation, Eq. (3), clearly fails. In contrast Eq. (12) agrees well with the numerical data (strictly, the uncrowded regime result gives a good approximation below $m_c = N e^{-\sigma^2/2}$). In particular, the HC of the numerical data in the analyzed cases is clearly above one and depends on the disorder width and the cognate site's energy, as demonstrated by Eq. (14).

FIG. 5: A classification of cognate sites of different TFs. For each TF (y-axis) $E_t - (-\sigma^2)$ is plotted for all its known cognate sites. Empty rectangles represent a hypothetical cognate site's energy that is half-occupied if the number of TFs is as specified in the database (see legend). For TFs with $m < m_c$ filled red areas represent the same hypothetical energy, as predicted by the uncrowded-regime (non-steric) result. When the typical number of the TFs in the cell is larger than $m_c$ the hypothetical cognate sites' energies calculated from the non-steric approximation, Eq. (4), are presented by the red filled area while the results of applying Eq. (13) are presented by the blue filled areas.
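
The two regimes can be explored numerically with the sketch below. It assumes the following forms, which are reconstructions consistent with the surrounding definitions rather than verified transcriptions: a crossover $m_c = N e^{-\sigma^2/2}$, a non-steric half-occupation $m_{1/2} = N e^{E_t + \sigma^2/2}$, and crowded-regime expressions $m_{1/2} = N (1 + 2\sigma^2)^{-1/2} \exp[-W^2/(2\sigma^2)]$ and $n = \sigma^2 (1 + W)/W^2$ with $W = W(\sigma^2 e^{-E_t})$. All function names are hypothetical, and the Lambert W-function is evaluated with a plain Newton iteration to keep the example free of external libraries.

```python
import math

def crossover_copy_number(N, sigma):
    """Assumed crossover m_c = N exp(-sigma^2 / 2) between the two regimes."""
    return N * math.exp(-sigma**2 / 2.0)

def nonsteric_half_occupation(N, sigma, E_t):
    """Assumed non-steric m_1/2 = N exp(E_t + sigma^2 / 2)."""
    return N * math.exp(E_t + sigma**2 / 2.0)

def lambert_w(x, tol=1e-12):
    """Principal branch W(x) for x > 0, via Newton iteration on w exp(w) = x."""
    w = math.log(1.0 + x)  # starting guess that lies above the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w

def crowded_hill_coefficient(sigma, E_t):
    """Assumed crowded-regime HC: sigma^2 (1 + W) / W^2."""
    w = lambert_w(sigma**2 * math.exp(-E_t))
    return sigma**2 * (1.0 + w) / w**2

def crowded_half_occupation(N, sigma, E_t):
    """Assumed crowded-regime m_1/2: N / sqrt(1 + 2 sigma^2) * exp(-W^2 / (2 sigma^2))."""
    w = lambert_w(sigma**2 * math.exp(-E_t))
    return N / math.sqrt(1.0 + 2.0 * sigma**2) * math.exp(-w**2 / (2.0 * sigma**2))

def is_crowded_at_half_occupation(sigma, E_t):
    """Condition E_t > -sigma^2 for half occupation to occur in the crowded regime."""
    return E_t > -sigma**2

# LexA / recQ parameters quoted in the Fig. 2 caption.
N = 2 * 4686077
sigma, E_t = 5.76, -20.6
m_c = crossover_copy_number(N, sigma)
n_crowded = crowded_hill_coefficient(sigma, E_t)
```

With the LexA parameters the crossover `m_c` comes out below one copy, in line with the caption's remark that it is nonphysical in this case, so any realistic copy number falls in the crowded regime.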

It is interesting to note that close to saturation the Hill curve (determined by the value of $m_{1/2}$ and the steepness of $P_t$) is well described by Eqs. (8), (9), (13) and (14), as presented in Fig. 1. However, in some cases there is a disagreement between the theory and the numerical results for small values of the protein copy number (see the small $m$ values in Fig. 3(b) and (c) and the low $E_t$ values in Fig. 1(b) and (d)). This disagreement is due to the deviation of the binding energy probability density from a normal distribution at the low energy tail (see Figs. 2(a) and 3(a)). This effect, which is easy to calculate numerically, increases both the value of $m_{1/2}$ and $n$ relative to the analytical predictions. As shown in Fig. 3(a) and (b), in some cases this effect may be significant and lead to a very large HC ($n = 8.3$ for the argT operon of RpoN). In all the presented cases the non-steric approximation clearly fails by several orders of magnitude.

To examine the validity and relevance of these results to other TFs we use a database of 17 known PWMs of TFs of E. coli, chosen randomly, and their known binding sites' sequences. Many of the TFs are present in large copy numbers, larger than their $m_c$ value (see the legend in Fig. 5). Therefore, the non-steric approximation of the occupation probability, Eq. (3), does not approximate well the occupation probability of many cognate sequences in the biologically relevant regime. In addition, as one can see in Fig. 5, the value of $E_t + \sigma^2$ for many cognate sites of many proteins is positive. As suggested by Eq. (15), to describe the occupation probability of such cognate sites Eqs. (12), (13) and (14) have to be used, while the non-steric approximation, Eq. (3), fails.

It is instructive to calculate the chemical potential, or the value of a hypothetical cognate site energy, denoted by $\mu$, which is half-occupied when the number of the TF, $m$, is as it is measured in experiments. The results are shown in Fig. 5. One can clearly see that the value of $\mu$ is smaller (larger) than $-\sigma^2$ if $m$ is smaller (larger) than $m_c$, as suggested by conditions (10) and (15). Moreover, the non-steric approximation predicts well the value of $\mu$ for $m < m_c$ but fails for $m > m_c$, predicting much lower values and, therefore, predicting very small occupation probability for all the cognate sites with an energy above $\mu$. In contrast, the crowded-regime results predict well the location of the hypothetical, half-occupied energy for both weak and strong cognate sequences.
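
For the uncrowded regime the hypothetical half-occupied energy follows directly from the non-steric constraint, assumed here to read $e^{\mu + \sigma^2/2} = m/N$. The sketch below evaluates it and checks that it crosses $-\sigma^2$ exactly at the assumed crossover $m_c = N e^{-\sigma^2/2}$. The function name and the choice of $\sigma$ are illustrative.

```python
import math

def half_occupied_energy_nonsteric(m, N, sigma):
    """Energy of a site that is half occupied at copy number m (non-steric estimate)."""
    return math.log(m / N) - sigma**2 / 2.0

N = 2 * 4686077
sigma = 3.0  # illustrative disorder width in units of k_B T
m_c = N * math.exp(-sigma**2 / 2.0)

mu_at_crossover = half_occupied_energy_nonsteric(m_c, N, sigma)
mu_below = half_occupied_energy_nonsteric(0.1 * m_c, N, sigma)
```

Below the crossover the estimate lies under $-\sigma^2$, matching the ordering stated in the text; above it the non-steric estimate is no longer expected to be reliable.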

II. DISCUSSION 

The considerations discussed in this paper suggest the existence of a disorder enhanced HC. They provide an estimate of the TF's copy number needed to significantly occupy its cognate sites. This estimate is shown in many cases to be much smaller and more consistent with the existing data than a naive estimate based on the non-steric approximation. By analyzing the disordered statistical mechanics problem analytically, we show that two regimes are possible. In the uncrowded regime the number of TFs, $m$, is much smaller than the crossover value $m_c$ and steric interactions can be ignored. In contrast, in the crowded regime, $m \gg m_c$, the steric interactions play an important role and change dramatically the saturation curve, both quantitatively and qualitatively. The crossover value, $m_c = N e^{-\sigma^2/2}$, can be much smaller than the DNA length and in many cases is much smaller than the measured number of the TF in the cell. In addition, for weak cognate sites with a binding energy $E_t > -\sigma^2$, the HC is always greater than one, even in the absence of cooperative binding, and in principle unbounded from above.

To understand this effect intuitively we note that in the high disorder regime (where the non-steric approximation fails) the partition function of the system is dominated by a small number of sites with a low energy. Therefore, any TF added to the system will immediately bind to one of these low energy sites. Their number can be much smaller than the total number of sites and comparable to the number of TFs. With this in mind it is clear that steric interactions play an important role in this regime and change quantitatively and qualitatively the saturation curve, including a disorder induced HC (see Fig. 4 for illustration).

In the SI we show that this effect persists as long as the proteins spend most of their time in a specific state on the DNA. When this is not the case the value of $m_{1/2}$ grows dramatically, making this a rather costly option for many TFs [21]. In fact, as we show for many of the TFs we consider, data on TF numbers in E. coli suggest a value of $m_{1/2}$ that is inconsistent with a partition function which is not dominated by specific conformations of the TF on the DNA. Finally, we show that when naive considerations give a HC of $\tilde{n}$ the disorder will lead to a HC which is given by $n \cdot \tilde{n}$, with $n$ the disorder enhanced HC obtained when there is no cooperative binding (see SI).

The above results are summarized in Fig. 5 where we plot for several TFs and all their known cognate sites the value of $E_t + \sigma^2$. The regime $E_t + \sigma^2$ smaller (greater) than zero corresponds to the absence (presence) of a disorder enhanced HC. In addition we calculate, using data on TF numbers, the value of $E_t + \sigma^2$ which would yield $P_t = 1/2$ with the typical estimated number of TFs in E. coli. As can be seen, the numbers indicate that a disorder induced HC is likely for a significant fraction of the TFs and their cognate sites. This happens since, as shown in the legend of Fig. 5, many TFs are present in numbers much larger than $m_c$ and, therefore, are located in the crowded regime. This study was performed using the measured PWMs of several TFs and can easily be extended to others.
III. METHODS 

For a cognate site with energy $E_t$ the occupation probability is given by

$$P_t = \frac{1}{1 + e^{E_t - \mu}}. \tag{16}$$



The chemical potential, fi, is set by the solution of the 
equation 

^ 1 

i—l 

Here $E_i$ is the binding energy of the TF at location $i = 1, 2, \ldots, N$, with $N$ the number of accessible DNA binding sites on the DNA. We assume that different copies of the TF exhibit steric interactions, resulting in Fermi-Dirac statistics in Eq. (17). $E_{ns}$ is the free energy associated with a TF in the solution or in nonspecific conformations on the DNA and, for now, under our assumption (ii) is negligible. In what follows it will be useful to bear in mind that a non-negligible contribution can only reduce the value of $P_t$ and enhance $m_{1/2}$. We discuss its possible contribution in the SI. To obtain the occupation probability of the cognate sites of the analyzed TFs we solve Eqs. (16) and (17) numerically.
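
A minimal sketch of such a numerical solution is given below. It assumes that nonspecific states are negligible (assumption (ii)), so that the chemical potential is fixed by requiring the Fermi-Dirac occupations of all sites to sum to $m$, and it uses bisection to find $\mu$. The synthetic Gaussian energies stand in for a real genome scan, and the function names are hypothetical.

```python
import math
import random

def fermi(E, mu):
    """Fermi-Dirac occupation 1 / (1 + exp(E - mu)), written to avoid overflow."""
    x = E - mu
    if x > 0:
        ex = math.exp(-x)
        return ex / (1.0 + ex)
    return 1.0 / (1.0 + math.exp(x))

def chemical_potential(energies, m, iterations=80):
    """Bisection for mu such that the summed occupation equals m (0 < m < N)."""
    lo = min(energies) - 50.0  # occupation is essentially zero here
    hi = max(energies) + 50.0  # occupation is essentially N here
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(fermi(E, mid) for E in energies) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cognate_occupation(E_t, energies, m):
    """Occupation probability of a site with energy E_t at copy number m."""
    return fermi(E_t, chemical_potential(energies, m))

# Synthetic landscape: 2000 sites with Gaussian energies of width sigma = 2.
random.seed(0)
energies = [random.gauss(0.0, 2.0) for _ in range(2000)]
mu_50 = chemical_potential(energies, 50)
```

Evaluating `cognate_occupation` over a range of `m` gives a saturation curve that can then be compared with a Hill function.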

IV. SUPPLEMENTARY INFORMATION 

The analysis in the article relied on (i) no inherent cooperativity in the TF, and (ii) a negligible probability to be in non-specific states either on or off the DNA. We now turn to discuss the influence of relaxing these assumptions.

A. Effects of cooperativity: 

In our study, so far, we ignored the possibility of cooperative effects between distinct molecules of the TF. However, many TFs are active only in their $\tilde{n}$-meric form. In this case the HC is usually expected to be close to $\tilde{n}$ since the number of active molecules of the TF is proportional to the $\tilde{n}$-th power of the concentration of monomers. One can reformulate the above derivation with $m$ acting as the number of active, $\tilde{n}$-meric copies of the TF. In this case the HC according to the arguments given above is $n \cdot \tilde{n}$. Thus, a combination of cooperativity and quenched disorder can naturally lead to high values of the HC.
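
The multiplication of the two coefficients can be checked numerically. The sketch below assumes a Hill curve with coefficient $n$ in the number of active multimers and takes that number to be the monomer count raised to the power $\tilde{n}$ (proportionality constant set to one for simplicity). The effective coefficient is read off as the logarithmic slope of $P/(1-P)$ at half occupation. All names are hypothetical.

```python
import math

def hill(m, m_half, n):
    """Hill curve in the number of active copies m."""
    return m**n / (m**n + m_half**n)

def occupation_vs_monomers(c, c_half, n, n_mer):
    """Occupation as a function of monomer number c when active copies scale as c**n_mer."""
    return hill(c**n_mer, c_half**n_mer, n)

def effective_hill_coefficient(curve, c_half, eps=1e-4):
    """Central-difference estimate of d ln[P/(1-P)] / d ln c at c = c_half."""
    def logit(c):
        p = curve(c)
        return math.log(p / (1.0 - p))
    up, down = c_half * math.exp(eps), c_half * math.exp(-eps)
    return (logit(up) - logit(down)) / (2.0 * eps)

n, n_mer, c_half = 1.7, 2, 100.0
n_eff = effective_hill_coefficient(
    lambda c: occupation_vs_monomers(c, c_half, n, n_mer), c_half)
```

Under these assumptions the slope equals the product of the disorder-enhanced coefficient and the multimer order.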
B. The effect of nonspecific states: 

In our study, so far, we ignored nonspecific states of the protein. These states exist and correspond either to the TF being in the solution or in a nonspecific conformation bound to the DNA. Namely, the nonspecific free energy is given by



In I 



'E3D 
where $E_{3D}$ is the free energy of an unbound TF (modeled, say, by the free energy of a solution of TFs) and $E_2$ is the energy of the bound protein in the nonspecific conformation, which for simplicity we assume to be the same for all sites along the DNA (the results are unchanged, being proportional to $N$, in the presence of small disorder in this binding energy with a slight reinterpretation of $E_2$). Clearly, these nonspecific states can only reduce the occupation probabilities of the cognate sites calculated in Eqs. (7) and (12). However, as suggested previously, the existence of these nonspecific states can be an important component for the dynamics of the search and recognition process carried out by the TF.

From Eq. (17) one can see that the nonspecific energy is negligible when



Ens > + In i 



(19) 



Otherwise a significant fraction of the TFs are in the nonspecific state so that the chemical potential is well approximated by $E_{ns}$. Thus, the occupation probability of the cognate site may be approximated by



Pt = min 



1 



1 



1 



(20) 



where ^. is given by Eqs. (O, and In Fig. [S]we show 
that indeed this results agree well with the numerical 
calculations for different values of the nonspecific energy. 
Thus, condition ([T^ implies that a half occupation of the 
cognate site occurs in the freezing regime (so that n> 1) 
if 



Ens > St + lnm 



1/2- 



(21) 



Since we are not aware of direct measurements of the 
nonspecific binding it is hard to estimate if this condition 
holds. We do stress, however, that the experimentally 
measured numbers of TFs in the cell suggest that it does 
not play an important role for many TFs. 

In sum, the nonspecific states may be ignored if most 
of the TFs are mostly bound to the DNA in a specific 
conformation. Otherwise the chemical potential takes 
a value which is closer to the nonspecific free energy. 
This causes the occupation probability to be well approx- 
imated by the non-steric approximation which results in 
a Hill curve with n = 1 and TO1/2 = e^^~^"=. However, 
in many cases the free energy of the nonspecific states 
which is needed to bring the system to the self-averaging regime is very low. Then the cognate site is significantly occupied only if the number of TFs is much larger than its typical value in the cell. In Fig. 7 the parameters of the Hill curve are shown for different values of $E_{ns}$. One can see that to decrease the HC to one, the value of $m_{1/2}$ has to be much larger than the typical copy number of the protein. Note that even in this case the occupation probability might scale as $m^n$ with $n > 1$ far from the saturation regime, $m_c \ll m \ll \exp(E_{ns} - \mu)$ (see Fig. 6 for an example). In fact the existence of the nonspecific states can totally eliminate the disorder induced HC for all values of $m$ only if the condition $E_{ns} \ll \mu + \ln m_c$ is satisfied.

FIG. 6: The occupation probability of the LexA TF to one of its cognate sequences, the recQ operon, is presented for different values of the nonspecific energy. Different symbols represent the numerical data (see legend) while the solid curves are based on Eq. (20).

C. Analytical solution 

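Assuming a pseudorandom DNA sequence implies that Eq. (17) is well approximated, for $N \gg 1$, by

$$\frac{m}{N} = \int \frac{\Pr(E)}{1 + e^{E - \mu}}\, dE \equiv \left\langle \frac{1}{1 + e^{E - \mu}} \right\rangle, \tag{22}$$

where the binding energy probability density is given by Eq. (2) and the angular brackets are defined by the integral.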

FIG. 7: The parameters of the fitted Hill curve, Eq. (1), are presented for the occupation probability of the LexA TF to one of its cognate sequences, the recQ operon, as a function of the nonspecific free energy. The circles represent $m_{1/2}$ while the squares represent $n$. The filled gray horizontal area shows the typical range of the LexA copy number in E. coli (200 to 4000).

First we discuss the non-steric approximation. In this case Boltzmann statistics takes the place of the Fermi-Dirac one everywhere except at the cognate site. Since the occupation of the target site is much smaller than $m$ it can be neglected and the constraint on the chemical potential is given by

$$e^{\mu + \sigma^2/2} = \frac{m}{N}. \tag{23}$$

Substituting the solution for $\mu$ in Eq. (16) one gets the non-steric approximation (3).
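
The relation between the Fermi-Dirac average and its Boltzmann limit can be checked with a direct quadrature. The sketch below assumes a zero-mean Gaussian energy density of width $\sigma$ and compares the average of $1/(1 + e^{E - \mu})$ with $e^{\mu + \sigma^2/2}$. The trapezoidal rule, the integration window and the function names are choices made for this illustration.

```python
import math

def gaussian_fermi_average(mu, sigma, points=20001, width=12.0):
    """Trapezoidal estimate of <1 / (1 + exp(E - mu))> for E ~ Normal(0, sigma^2)."""
    a, b = -width * sigma, width * sigma
    h = (b - a) / (points - 1)
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(points):
        E = a + k * h
        weight = 0.5 if k in (0, points - 1) else 1.0
        pdf = math.exp(-E * E / (2.0 * sigma * sigma)) / norm
        x = E - mu
        occupation = 1.0 / (1.0 + math.exp(x)) if x < 700.0 else 0.0
        total += weight * pdf * occupation
    return total * h

def boltzmann_average(mu, sigma):
    """Boltzmann limit <exp(-(E - mu))> = exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma**2 / 2.0)

sigma = 2.0
dilute = gaussian_fermi_average(-12.0, sigma)   # mu far below -sigma^2
crowded = gaussian_fermi_average(0.0, sigma)    # mu at the centre of the band
```

For a chemical potential far below $-\sigma^2$ the two averages agree closely, while for larger $\mu$ the Boltzmann expression exceeds one and is no longer meaningful as a filling fraction $m/N$.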

As is shown in the main text the non-steric approximation clearly fails. We now turn to evaluate the integral in Eq. (22) using a saddle point approximation. The resulting set of equations is



ln(---l 



(24) 



and 



m 

TV' 



where $E_s$ is the value of $E$ at the saddle point. Eq. (25) may be solved self-consistently in two limiting cases:

c. Uncrowded regime: In this regime the saddle point occurs at $E_s \approx -\sigma^2$, so that Eq. (25) implies



2/1 "^.<tV2 



(26) 



Self-consistency in this regime requires

$$m \ll m_c = N e^{-\sigma^2/2}. \tag{27}$$

The chemical potential ($\mu$) in this limit is given by

$$\mu = -\frac{\sigma^2}{2} - \ln\frac{N}{m}, \tag{28}$$

and the occupation probability is given by Eq. (7).

d. Crowded regime: In this regime the saddle point 
satisfies ^ ct^ and Eq. ([25)) implies 



E, = -crW21n 



N 



(29) 



Self-consistency requires

$$m \gg m_c = N e^{-\sigma^2/2}. \tag{30}$$

The chemical potential in this limit is then given by



/i = —aJ2 In 



N 



iVl + 20-2 



-In 



(25) and the occupation probability by Eq. ([T^. 



- 1 
(31) 